Search CORE

276 research outputs found

An exploratory data analysis method to reveal modular latent structures in high-throughput data

Author: Yu Tianwei
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract. Background: Modular structures are ubiquitous across various types of biological networks. The study of network modularity can help reveal regulatory mechanisms in systems biology, evolutionary biology and developmental biology. Identifying putative modular latent structures from high-throughput data using exploratory analysis can help better interpret the data and generate new hypotheses. Unsupervised learning methods designed for global dimension reduction or clustering fall short of identifying modules with factors acting in linear combinations. Results: We present an exploratory data analysis method named MLSA (Modular Latent Structure Analysis) to estimate modular latent structures, which can find co-regulative modules that involve non-coexpressive genes. Conclusions: Through simulations and real-data analyses, we show that the method can recover modular latent structures effectively. In addition, the method also performed very well on data generated from sparse global latent factor models. The R code is available at http://userwww.service.emory.edu/~tyu8/MLSA/.

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Capturing changes in gene expression dynamics by gene set differential coordination analysis

Authors: Bai Yun; Yu Tianwei
Publication date: 01/01/2011

Analyzing gene expression data at the gene set level greatly improves feature extraction and data interpretation. Currently, most efforts in gene set analysis focus on differential expression analysis, that is, finding gene sets whose genes show a first-order relationship with the clinical outcome. However, the regulation of the biological system is complex, and much of the change in gene expression dynamics does not manifest in the form of differential expression. At the gene set level, capturing the change in expression dynamics is difficult due to the complexity and heterogeneity of the gene sets. Here we report a systematic approach to detect gene sets that show differential coordination patterns with the rest of the transcriptome, as well as pairs of gene sets that are differentially coordinated with each other. We demonstrate that the method can identify biologically relevant gene sets, many of which do not show a first-order relationship with the clinical outcome.

Elsevier - Publisher Connector

PubMed Central

Philadelphia College of Osteopathic Medicine: DigitalCommons@PCOM

Improving gene expression data interpretation by finding latent factors that co-regulate gene modules with clinical factors

Authors: Bai Yun; Yu Tianwei
Publication venue: BioMed Central
Publication date: 01/01/2011

Abstract. Background: In the analysis of high-throughput data with a clinical outcome, researchers mostly focus on genes/proteins that show first-order relations with the clinical outcome. While this approach yields biomarkers and biological mechanisms that are easily interpretable, it may miss information that is important to the understanding of disease mechanism and/or treatment response. Here we test the hypothesis that unobserved factors can be mobilized by the living system to coordinate the response to the clinical factors. Results: We developed a computational method named Guided Latent Factor Discovery (GLFD) to identify hidden factors that act in combination with the observed clinical factors to control gene modules. In simulation studies, the method recovered masked factors effectively. Using real microarray data, we demonstrate that the method identifies latent factors that are biologically relevant, and extracts more information than analyzing only the first-order response to the clinical outcome. Conclusions: Finding latent factors using GLFD brings extra insight into the mechanisms of the disease/drug response. The R code of the method is available at http://userwww.service.emory.edu/~tyu8/GLFD.

Crossref

Springer

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Philadelphia College of Osteopathic Medicine: DigitalCommons@PCOM

An expectation-maximization algorithm for probabilistic reconstructions of full-length isoforms from splice graphs.

Authors: Kim Joseph; Lee Christopher; Roy Meenakshi; Wu Ying Nian; Xing Yi; Yu Tianwei
Publication venue: eScholarship, University of California
Publication date: 01/01/2006

Reconstructing full-length transcript isoforms from sequence fragments (such as ESTs) is a major interest and challenge for bioinformatic analysis of pre-mRNA alternative splicing. This problem has been formulated as finding traversals across the splice graph, which is a directed acyclic graph (DAG) representation of gene structure and alternative splicing. In this manuscript we introduce a probabilistic formulation of the isoform reconstruction problem, and provide an expectation-maximization (EM) algorithm for its maximum likelihood solution. Using a series of simulated data sets and expressed sequences from real human genes, we demonstrate that our EM algorithm can correctly handle various situations of fragmentation and coupling in the input data. Our work establishes a general probabilistic framework for splice graph-based reconstructions of full-length isoforms.
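
As a rough illustration of the idea in this abstract, the sketch below runs a generic EM mixture update over candidate isoforms. It is not the authors' algorithm. It assumes the candidate isoforms (splice graph traversals) are already enumerated as exon lists, that a fragment is compatible with an isoform when it appears as a contiguous run of exons, and that fragment length and coverage effects can be ignored. The names `is_subpath` and `em_isoform_proportions` are hypothetical.

```python
from typing import List, Sequence


def is_subpath(fragment: Sequence[str], isoform: Sequence[str]) -> bool:
    """True if the fragment occurs as a contiguous run of exons in the isoform."""
    m, n = len(fragment), len(isoform)
    return any(tuple(isoform[i:i + m]) == tuple(fragment) for i in range(n - m + 1))


def em_isoform_proportions(
    fragments: List[Sequence[str]],
    isoforms: List[Sequence[str]],
    n_iter: int = 200,
    tol: float = 1e-9,
) -> List[float]:
    """Estimate isoform proportions by EM (simplified model, an assumption)."""
    # Compatibility indicators: compat[i][j] is 1.0 if fragment i fits isoform j.
    compat = [[1.0 if is_subpath(f, iso) else 0.0 for iso in isoforms]
              for f in fragments]
    k = len(isoforms)
    theta = [1.0 / k] * k  # start from equal proportions

    for _ in range(n_iter):
        counts = [0.0] * k
        # E-step: split each fragment across its compatible isoforms,
        # weighted by the current proportion estimates.
        for row in compat:
            weights = [t * c for t, c in zip(theta, row)]
            total = sum(weights)
            if total == 0.0:
                continue  # fragment explained by no candidate isoform
            for j, w in enumerate(weights):
                counts[j] += w / total
        n_used = sum(counts)
        if n_used == 0.0:
            raise ValueError("no fragment is compatible with any isoform")
        # M-step: proportions are the normalized expected fragment counts.
        new_theta = [c / n_used for c in counts]
        converged = max(abs(a - b) for a, b in zip(theta, new_theta)) < tol
        theta = new_theta
        if converged:
            break
    return theta


if __name__ == "__main__":
    # Toy gene: one isoform includes exon e2, the other skips it.
    isoforms = [["e1", "e2", "e3"], ["e1", "e3"]]
    fragments = [["e1", "e2"], ["e2", "e3"], ["e1", "e3"], ["e1"]]
    print(em_isoform_proportions(fragments, isoforms))
```

In this toy input, two fragments fit only the exon-including isoform, one fits only the exon-skipping isoform, and one fits both. EM divides the ambiguous fragment in proportion to the current estimates, so the proportions settle near 2/3 and 1/3.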

PubMed Central

eScholarship - University of California